
    A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce

    Nowadays, many companies have large amounts of raw, unstructured data available. Among Big Data enabling technologies, a central place is held by the MapReduce framework and, in particular, by its open-source implementation, Apache Hadoop. For cost-effectiveness reasons, a common approach entails sharing server clusters among multiple users. The underlying infrastructure should provide every user with a fair share of computational resources, ensuring that Service Level Agreements (SLAs) are met and avoiding waste. In this paper we consider two mathematical programming problems that model the optimal allocation of computational resources in a Hadoop 2.x cluster, with the aim of developing new capacity allocation techniques that guarantee better performance in shared data centers. Our goal is to achieve a substantial reduction in power consumption while respecting the deadlines stated in the SLAs and avoiding the penalties associated with job rejections. The core of this approach is a distributed algorithm for runtime capacity allocation, based on Game Theory models and techniques, that mimics the MapReduce dynamics by means of interacting players, namely the central Resource Manager and the Class Managers.
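
    As a rough illustration of this interaction, the following minimal Python sketch (hypothetical names and a simple proportional-sharing rule, not the paper's actual game-theoretic mechanism) has each Class Manager bid for the capacity it needs to meet its deadline, while the Resource Manager reconciles the bids against the cluster size:

        # Hypothetical sketch: each Class Manager estimates the container slots
        # needed to finish its pending work within its SLA deadline; the Resource
        # Manager scales the requests down proportionally when the cluster is full.

        def class_manager_bid(work_task_seconds, deadline_s):
            """Slots needed so that the outstanding work completes by the deadline."""
            return sum(work_task_seconds) / deadline_s

        def resource_manager_allocate(bids, cluster_slots):
            """Grant requests as-is if they fit, otherwise share the cluster proportionally."""
            total = sum(bids.values())
            if total <= cluster_slots:
                return dict(bids)
            scale = cluster_slots / total
            return {job_class: bid * scale for job_class, bid in bids.items()}

        # One allocation round; at runtime this would be repeated as jobs arrive.
        bids = {
            "interactive": class_manager_bid([12_000, 8_000], deadline_s=600),
            "batch": class_manager_bid([400_000], deadline_s=3_600),
        }
        print(resource_manager_allocate(bids, cluster_slots=100))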

    Poli-RISPOSTA Mobile App

    The Poli-RISPOSTA mobile app is an Android application developed to support the data-gathering step of the Poli-RISPOSTA procedure performed after a flood event. Its aim is to digitalize the current paper-based approach used by the Italian Civil Protection. The proposed system aims to provide a consistent approach for collecting and storing data and to develop a long-term application solution by providing an environment for analyzing, managing, accessing and reusing information, objects and data. Although the elements that cause floods are simple, floods constitute a high risk to both rural and urban settlements. As more and more land is covered by urban settlements, the ground becomes less permeable, so more water is unable to seep into the aquifer and instead submerges the surface. The main factor driving the growing risk of floods is the increasing value of the assets built on the land. The reconstruction process that follows such a tragic event needs to be planned and tailored to an analysis of the damage to infrastructure and properties. Collecting data after floods is necessary not only to verify requests for reimbursement or budget allocations, but also to gather knowledge about which factors and features of settlements constitute flood risks. While specific expertise is needed to assess damage to complex infrastructure, the inspection of residential properties is more repetitive and needs a systematic methodology to address the multitude of sites to be inspected and the amount of data to be gathered, organized and analysed.

    Service provisioning problem in cloud and multi-cloud systems

    Cloud computing is an emerging paradigm that aims to streamline the on-demand provisioning of resources as services, providing end users with flexible and scalable services accessible through the Internet on a pay-per-use basis. Because modern cloud systems operate in an open and dynamic world characterized by continuous change, the development of efficient resource provisioning policies for cloud-based services becomes increasingly challenging. This paper studies the hourly service provisioning problem through a generalized Nash game model. We take the perspective of Software as a Service (SaaS) providers that want to minimize the costs associated with the virtual machine instances allocated across multiple Infrastructure as a Service (IaaS) providers, while avoiding penalties for execution failures and providing quality-of-service guarantees. SaaS providers compete and bid for the use of infrastructural resources, whereas the IaaS providers want to maximize the revenues obtained by providing virtualized resources. We propose a solution algorithm based on best-reply dynamics, which is suitable for a distributed implementation. We demonstrate the effectiveness of our approach by performing numerical tests considering multiple workloads and system configurations. Results show that our algorithm is scalable and provides significant cost savings with respect to alternative methods (5% on average, but up to 260% for individual SaaS providers). Furthermore, varying the number of IaaS providers shows that cost savings of 8%-15% can be achieved by distributing the workload across multiple IaaSs.
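
    A minimal Python sketch of best-reply dynamics (generic and illustrative: in the paper each best response is itself a constrained cost-minimization problem, which is abstracted here as a callback, and the toy response function below is made up) could look as follows:

        # Generic best-reply loop: each player in turn recomputes its best strategy
        # given the others' current strategies, until nobody wants to deviate.

        def best_reply_dynamics(initial, best_response, max_rounds=100, tol=1e-6):
            """initial: dict player -> strategy (here a single number, e.g. VMs requested).
            best_response(player, others) -> that player's updated strategy."""
            strategies = dict(initial)
            for _ in range(max_rounds):
                changed = False
                for player in strategies:
                    others = {p: s for p, s in strategies.items() if p != player}
                    new = best_response(player, others)
                    if abs(new - strategies[player]) > tol:
                        strategies[player] = new
                        changed = True
                if not changed:  # fixed point reached: an equilibrium candidate
                    break
            return strategies

        # Toy best response: request capacity proportional to demand, backing off
        # slightly when the other players already request a lot (purely illustrative).
        demand = {"saas_1": 50.0, "saas_2": 30.0}
        def toy_best_response(player, others):
            return demand[player] / (1.0 + 0.01 * sum(others.values()))

        print(best_reply_dynamics({"saas_1": 0.0, "saas_2": 0.0}, toy_best_response))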

    A Multi Model Algorithm for the Cost Oriented Design of the Information Technology Infrastructure

    Multiple combinations of hardware and network components can be selected to design an information technology (IT) infrastructure that satisfies performance requirements. The professional criterion for dealing with these degrees of freedom is cost minimization. However, a scientific approach has rarely been applied to cost minimization, and rigorous methodological support for the cost issues of infrastructural design is still lacking. The methodological contribution of this paper is the representation of complex infrastructural design issues as a set of four intertwined cost-minimization sub-problems: two set-coverings, a set-packing, and a min k-cut with a non-linear objective function. Optimization is accomplished by sequentially solving all sub-problems with a heuristic approach and finally tuning the solution with a local-search approach. The methodology is empirically verified with a software tool including a database of costs that has also been built as part of this research. The work shows how an overall cost-minimization approach can provide significant savings and indicates how the corresponding infrastructural design rules can substantially differ from the local optima previously identified in the professional literature.
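
    As a rough illustration of one of these building blocks, the classic greedy heuristic for weighted set covering (shown here generically in Python with made-up component names and costs; it is not the paper's specific heuristic) repeatedly picks the component with the lowest cost per still-uncovered requirement:

        # Standard greedy heuristic for weighted set covering, shown only to
        # illustrate one of the four sub-problems mentioned above.

        def greedy_set_cover(requirements, components):
            """components: dict name -> (cost, set of requirements it satisfies)."""
            uncovered = set(requirements)
            chosen = []
            while uncovered:
                # pick the component with the best cost per newly covered requirement
                name, (cost, covers) = min(
                    ((n, c) for n, c in components.items() if c[1] & uncovered),
                    key=lambda item: item[1][0] / len(item[1][1] & uncovered),
                )
                chosen.append(name)
                uncovered -= covers
            return chosen

        components = {
            "server_A": (1200, {"web", "app"}),
            "server_B": (2000, {"web", "app", "db"}),
            "server_C": (900, {"db"}),
        }
        print(greedy_set_cover({"web", "app", "db"}, components))

    On this toy input the greedy heuristic picks server_A and then server_C (total cost 2100), while server_B alone would cost 2000; this kind of local sub-optimality is what a final local-search tuning step can repair.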

    Special issue on algorithms for the resource management of large scale infrastructures

    Modern distributed systems are becoming increasingly complex as virtualization is applied at the levels of both computing and networking. Consequently, the resource management of this infrastructure requires innovative and efficient solutions. The issue is further exacerbated by the unpredictable workload of modern applications and the need to limit global energy consumption. The purpose of this special issue is to present recent advances and emerging solutions that address the challenge of resource management in the context of modern large-scale infrastructures. We believe that the four selected papers present an up-to-date view of the emerging trends and propose innovative solutions to support efficient, self-managing systems that are able to adapt to, manage, and cope with changes arising from continually changing workloads and application deployment settings, without the need for human supervision.

    Context-aware Data Quality Assessment for Big Data

    Big data has changed the way in which we collect and analyze data. In particular, the amount of available information is constantly growing, and organizations rely more and more on data analysis in order to achieve their competitive advantage. However, such an amount of data can create real value only if combined with quality: good decisions and actions are the result of correct, reliable and complete data. In such a scenario, methods and techniques for data quality assessment can support the identification of suitable data to process. While numerous assessment methods have been proposed for traditional databases, in the big data scenario new algorithms have to be designed in order to deal with novel requirements related to variety, volume and velocity. In particular, in this paper we highlight that dealing with heterogeneous sources requires an adaptive approach able to trigger the suitable quality assessment methods on the basis of the data type and the context in which the data are to be used. Furthermore, we show that in some situations it is not possible to evaluate the quality of the entire dataset due to performance and time constraints. For this reason, we suggest focusing the data quality assessment on only a portion of the dataset and taking into account the consequent loss of accuracy by introducing a confidence factor as a measure of the reliability of the quality assessment procedure. We propose a methodology to build a data quality adapter module which selects the best configuration for the data quality assessment based on the user's main requirements: time minimization, confidence maximization, and budget minimization. Experiments are performed considering real data gathered from a smart city case study.
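
    A minimal Python sketch of this sampling idea (with a made-up completeness metric and a deliberately naive confidence factor equal to the sampled fraction; the paper's adapter module selects among richer configurations) could look like this:

        # Illustrative only: assess a simple completeness metric on a random sample
        # of the records instead of the full dataset, and report a confidence factor
        # that grows with the share of data actually inspected.

        import random

        def sampled_completeness(records, field, sample_fraction):
            """Fraction of sampled records with a non-missing value for `field`,
            together with a naive confidence factor (the fraction of data inspected)."""
            k = max(1, int(len(records) * sample_fraction))
            sample = random.sample(records, k)
            complete = sum(1 for r in sample if r.get(field) not in (None, ""))
            return complete / k, sample_fraction

        # Hypothetical smart-city-style records with some missing sensor readings.
        records = [{"sensor_id": i, "pm10": None if i % 7 == 0 else 21.0} for i in range(10_000)]
        quality, confidence = sampled_completeness(records, "pm10", sample_fraction=0.1)
        print(f"completeness ~ {quality:.3f}, confidence factor {confidence}")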